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Abstract 

We show that the Gamma distribution is not an adequate fit for the 
probability density function of drop diameters using the Kolmogorov- 
Smirnov goodness of fit test. We propose a different parametriza- 
tion of drop size distributions, which not depending by any particular 
functional form, is based on the adoption of standardized central mo- 
ments. The first three standardized central moments are sufficient to 
characterize the distribution of drop diamters at the ground. These 
parameters together with the drop count form a 4-tuple which fully 
describe the variability of the drop size distributions. The Cartesian 
product of this 4-tuple of parameters is the rainfall phase space. Using 
disdrometer data from 10 different locations we identify invariant, not 
depending on location, properties of the rainfall phenomenon. 

1 Introduction 

At "punctual" space scale (~50 cm 2 ) rain can be described by a stochastic 
sequence of couples (Dj,Tj): Dj being the diameter of the j-th drops, and 
Tj the interval of time between the arrival of the j-th drop and the (j + 
l)-th drop (j = 1,2,3,...). Partitioning the time axis in sampling time 
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intervals of equal duration it is possible to build from the sequence (Dj,Tj) 
the sequence (pk(D), N k ), with Pk(D) the probability density function of 
drop diameter, and Nk the number of drops observed at the ground during 
the l-th sampling time interval (I = 1,2,....). These two last quantities 
are usually measured by disdrometers with a sampling time in the range 
10s-5min. Observational evidences suggest that rain is a non- homogeneous 
process as the variability observed in the sequence (pi(D), N t ) cannot simply 
ascribed to the variability expected when sampling from a single stochastic 
process [211 HU El E] • The sequence (Dj,Tj), and thus (pi(D), Ni), is non- 
stationary: the "statistical rules" to which the couples (Dj,Tj) obey are not 
invariant under time translation |2"3] . 

A proper description of the distribution of drop sizes, which is funda- 
mental for understanding the microphysics involved in the mechanisms of 
precipitation formation and retrieving rainfall from radar sensors, has been 
the main research goal since the introduction of the impact disdrometer has 
made reliable and prolonged measurements of drop diameters feasible. The 
main tool the meteorological community uses to describe the variability of the 
rainfall phenomenon is the drop size distribution, namely the concentration 
M{D) per unit volume and diameter. In their seminal work Marshall and 
Palmer [15J proposed the exponential as functional dependence for M{D) 
with a decaying constant depending on the rainfall rate. Later, Joss and 
Gori [9] made clear that the exponential form is the results of sampling long 
time intervals (~30 minutes), and at smaller time intervals (~1 minute) the 
drop size distribution does not have an exponential functional dependence. 
Ulbrich [25J proposed the Gamma distribution as the functional form for the 
drop size distributions sampled at short time intervals. The rationale for 
this choice is that while an exponential decay is observed for the right tail 
of the distribution (recently [25] have shown that break-up due to air vis- 
cosity is the main mechanism underlying the occurrence of exponential right 
tail), the density at smaller diameters is smaller than that of an exponential 
distribution. The Gamma distribution has been the only choice adopted in 
literature (if one excludes sporadic attempts with the log-Normal distribu- 
tion, see e.g. [3J, [IS], and [TF]) mainly because of two properties: 1) it is 
defined by two parameters, and 2) its moments can easily calculated so that 
any rain bulk variable can easily expressed in terms of the parameters of the 
Gamma distribution. The Gamma distribution has been widely accepted 
by radar meteorology and cloud physics communities, even if measurements 
show that the Gamma distribution is not general enough to represent ade- 
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quately the full range of the sample variability, [TT]. The fit accuracy of the 
Gamma distribution to disdrometer data has not been properly addressed 
(aside from subjective statements: "a good fit" or "a reasonable fit" not sup- 
ported by any objective measurement) or mistakenly addressed (see Section 
H|) as in [20]. In the present work, we consider data from 10 different sites, 
and use the Kolmogorov-Smirnov goodness of fit test (e.g. [HI1EE2]) t° show 
that the Gamma distribution is a poor fit to 1 minute sampled drop size 
distribution. 

To obviate to this inadequacy, we propose a parametrization of the drop 
size distribution based on a standard procedure in statistical science: the 
description of an unknown probability density function in terms of its mean 
/i, standard deviation a, skewness 7, and kurtosis k. In particular, we show 
that mean, standard deviation and skewness are the minimum number of 
parameters, which effectively describe the drop size distributions. These pa- 
rameters, together with the drop count N, are the variables necessary to de- 
scribe the rainfall phenomenon at a punctual scale in space and at short time 
scales. All bulk variables of interest can be derived by the 4-tuple of param- 
eters (N, fi, cr, 7). Adopting the jargon of the scientific community studying 
dynamical systems, we refer to the four dimensional Cartesian product R 4 
spanned by the 4-tuple (N, /i, a, 7) as the rainfall phase space. 

The work is organized as follows. In Section [2j we discuss the data used 
in our analysis, Section [3] we illustrate all the methods of analysis adopted. 
Section 0] reports our results, and Section our conclusions. 

2 Data 

We consider Joss-Waldvogel disdrometer data sampled at 1 minute time in- 
tervals from ten different locations on Earth's surface. Table 1 gives the list 
of the locations with a three letter code used to reference the site in the Fig- 
ures. In addition, the Kppen-Geiger climate classification ([13]) of each site 
is provided. Table 2 completes the description of the data sets providing for 
each site the latitude, longitude, altitude, together with number of minutes 
and drops considered in the analysis. 
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2.1 Data processing 

All data are processed as follows. We consider for each data base only minutes 
for which the drop count is >60 (drop arrival rate ~1 per second). The ratio- 
nale for this choice is twofold: A) It guarantees a minimum reasonable accu- 
racy for the estimation of the probability density function p(D), and the other 
statistical parameters (mean, standard deviation, ...). B) It excludes time 
intervals of observation which are quiescent (sparse precipitation) whose con- 
tribution to the total cumulated precipitated volume is negligible (see discus- 
sion in [7] ) . After this first filtering, we eliminate from the remaining minutes 
of observation any outlier drop count. For each minute, we find the disdrom- 
eter class with the maximum probability density, and we calculate the central 
continuous non-zero span of the probability density: the set of contiguous dis- 
drometer classes with non zero counts and which includes the class with maxi- 
mum density. The counts in all disdrometer classes which do not belong to the 
central continuous non-zero span are considered as outliers and are discarded: 
e.g. the disdrometer count (3,11,18,31,30,35,80,52,41,39,44,21,5,0,1,0,0,0,0,0) 
has an outlier in the 15-th class which is disregarded leading to the count 
(3,11,18,31,30,35,80,52,41,39,44,21, 

5,0,0,0,0,0,0,0). The removal of outliers drop counts improves the estimate 
of higher moments of the probability density function p(D). A detailed dis- 
cussion on outliers and their effects on estimated statistical parameters can 
be found in [5]. 

3 Methods 

We now describe the methods used to quantify the variability of the rainfall 
phenomenon. 

3.1 Variability of drop diameters 

Two equivalent descriptions are possible for the variability of drop diameters. 
1) The drop size distribution defined as the concentration per unit volume 
and unit diameter Af(D) 

AT(D) = N v f(D), (1) 
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where Ny is the number of drops per unit volume and f(D) is the probability 
density function of drop diameter in the unit volume. 2) The flux-equivalent 
of Eq.(ffJ): Ny —> N and f(D) — > p(D). N is the number of drops observed 
at the ground, and p(D) is the probability density function of drop diameter 
at the ground. The two descriptions are equivalent as 

P(D) = ^p^Nyf(D), (2) 

and 

oo 

N = A m TN v J v(D)f(D)dD. (3) 
o 

In the above equations, T is the time interval of observation (in seconds), 
A m the capture area of the instrument (in m 2 ), and v(D) the drop velocity 
of diameter D (in m/s). Usually it is assumed that the arrival velocity of 
drops is equal to their limit velocity and v(D) = CD b , where C = 3.78 m/(s 
mm 0,67 ) and b = 0.67 ([25]). With these definitions 

p(D) = ED^Nyf(D) E = (4) 

By use of Eq. (pE} , we can connect the moments M QiP of the probability density 
function observed at the ground with those of the concentration per unit 
volume M a jj: 

M at p = SM( a+0 .67),AT < ^=^ M a ^ = H _1 M( Q _ .67),p- (5) 

Using the above equation one is able to derive expressions for any rain bulk 
variable. E.g. the rainfall rate R (in mm/h) is 

67rl0 -4 

R = 67rl0- 4 CM3. 67 ^ = — — - NM^ p . (6) 

The concentration per unit volume, Eq.([T]) is by far the most common quan- 
titiy in literature, even if measurements of drop sizes at the ground are by 
large the only available data. Hereby, we adopt the probability density of 
drop diameter at the ground, Eq.©, as measured by disdrometer counts. 
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3.2 The Gamma distribution for M{D) and p(D) 

The most common functional form adopted for the probability density func- 
tion f(D) in Eq.([T]) is the Gamma distribution 

\k+i 

f(D) = fr{D, A, k) = __I^exp(-AI>), (7) 

where k is the shape, A the inverse scale parameter, and T(x) is Gamma 
function. Consequently the statistical moment of order a of the drop size 
distribution Af(D) is 

oo 

M a , N = N V J dDD a f r (D, A, k) = N v ^±^±^\~ a . (8) 
o 

However, it is common practice, in Literature, to write Eqs.([T]) and ([7]) as 

M(D) = N D k exp(-XD), (9) 



where N n is defined as 



\(fc+i) 



With the above notation 



M a , M = N T{k + a + l)A-( fc+Q+1 >. (11) 

Finally, if the probability density function f(D) is a Gamma distribution, 
then also the probability density at the ground p(D) is a Gamma distribution 

M{D) = NJ r (D, A, k) ^ p(D) = f r (D, X,k + 0.67). (12) 



3.3 Fitting the Gamma distribution to J\f(D) and p(D) 

We briefly review the two main methods which have been adopted in litera- 
ture to obtain the three parameters (Ny, A, k) in Eqs.([T]) and ([7]). 
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3.3.1 Method of the Moments (MM) 

The method of moments (MM) uses the moments of the observed distribution 
to derive the parameters of the desired fitting function. For a drop size 
distribution with a Gamma distribution for diameter density, the MM n i in2;n 3 
method finds the number of drops per unit volume Ny, the scale k and shape 
A such that the resulting distribution M{D) exactly matches the moments of 
order nl, n2 and n3 of the observed distribution. Hereby, we use the MM 3] 4 ;6 
and MM 2 ,3,4 procedures as they are the ones usually adopted in Literature 
(e.g. [23] and [22]), together with the MM .67,i.67,2.67 adopted by [20]. This 
last procedure finds the concentration Ny, the scale k and shape A of the 
drop size distribution M{D) matching the observed number of drops at the 
ground N (moment 0.67), the observed average drop diameter at the ground 
fi (moment 1.67), and the observed second moment of drop diameter at the 
ground M2 iP (moment 2.67). Note that the last two conditions imply that 
the resulting drop size distribution matches the standard deviation of drop 
diameter a observed at the ground. Therefore, we will use for brevity the 
notation MM^,^ instead of MM .67,i.67,2.67- 

Using Eqs. ([I]) and OH]), the parameters Ny, A, and k are 




G 




(13) 




(14) 




G 




(15) 
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3.3.2 Method of Maximum Likelihood (MML) 

The method of maximum likelihood is the main alternative to the method of 
moments. Let fx (x, 9) be the probability density function of the variable X 
given the vector of parameters 9, and x\, x, x^ is a sample of size N. The 
likelihood L(9) that the sample (xi, X2 t , ■ ■■,x n ) is drawn from the distribution 

TV 

fx (x, 9) is defined as L (9) = Ylfx (xi, 9). In practice it is more convenient 

i=l 

to deal with the logarithm of the likelihood, denominated the log-likelihood 

N 

In L (9) = In fx (xi, 9), or the average log-likelihood / (9) = In L {9). The 

i=l 

ML method makes an estimation of 9 maximizing the average log-likelihood, 

i.e. 9m ml = argmax [I (9)], [2]. Dealing with disdrometer data, it is impor- 
e 

tant to recall that disdrometers collect drops with a diameter D > D m i n = 0.3 
mm (for the JW disdrometer), making a lower truncation in the sample dis- 
tribution. In the absence of small drops in sample datasets, the method 
of maximum likelihood ignoring this problem exhibits large bias which do 
not decrease increasing the sample size, [11]. Consequently, modifications to 
the MML are necessary to deal explicitly situations where lower truncations 
to the samples are present. We will consider the lower truncated Gamma 
distribution which density is 



f(D) = f r (D,D mm ,X,k) = ^ 7(fc+i ; AD _ } ' - D > D ml! , (10) 



r(fe+i) 



D k exp(-XD) 



r(fc+i) 



where 7 (k + 1, \D min ) is the incomplete Gamma function calculated in \D, n 
According to [8], the average log-likelihood is 



I (A, k) = - In ( 1 - 7 (fc J" ; 1; AAm " 7 ) + (fc + 1) In A - lnr (fc + 1) + 




(17) 



The estimates of A and k are obtained numerically minimizing the function 
/ (A, k), using the R code provided by [S] in their appendix. 
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3.4 Statistical characterization of a probability distri- 
bution function 

Under some general condition (e.g. [IS]), a probability density function is 
completely determined by its moments: given the sequence of moments {Mj}, 
j = 1,2,3,... there exists an unique f(x) such that Mj = f x^f(x)dx. 
This fact has lead to the moment-characterization, in statistical sciences, 
of probability density function for which a known parametric form is not 
available. For this purpose a suitable number of moments and/or function 
of moments will provide information on the unknown probability density 
function f(x) and, as a consequence, on the dynamical process driving the 
realizations of the stochastic variable x. We refer to these parameters as the 
statistical descriptors of the probability density function. 

The two most commonly used statistical descriptors are the mean \i = Mi, 
and the standard deviation a = a/ M 2 — (Mi) 2 . In addition to these two pa- 
rameters, the skewness 7, measuring the asymmetry of the distribution, and 
the kurtosis k, measuring the peakedness of the distribution, are used. Skew- 
ness and kurtosis are the third and the fourth standardized central moments 
(the expectation value of [(x — /i)/o"] 3 and [(x — /i)/cr] 4 ) and can be written 
in terms of the moments of the distribution as follows: 

_ M 3 + 2(MQ 3 - 3M!M 2 
7 [M 2 - (MO 2 ] 3 / 2 1 j 

and 

_ M 4 - 3(MQ 4 + 6(Mx) 2 M 2 - 4M X M 3 

[M 2 -(M X ) 2 ] 2 ' [ ' 

Higher standardized moments do not have a particular name and are 
generally not used since the higher the moment the larger are the inaccuracies 
of any estimate. However, hereby we will make use of the fifth standardized 
central moment, the expectation values of [(x — fi)/a] 5 , which will denote 
with the letter 77 

= M 5 + 4(MQ 5 + 10(M 1 ) 2 M 3 - 10(M!) 3 M 2 - 5M 4 M X 
V ~ [M 2 - (Mx) 2 ] 5 /2 • t > 

Mean, standard deviation, skewness, and kurtosis are sufficient to achieve a 
satisfactory (albeit not full) description of any probability density function 
of interest. In many cases they are redundant as some of the statistical 
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descriptors are shown to be function of the others. For example if f(x) = 
fr(x, A, k) then only two elements of the 4-tuple (/i, <r, 7, k) are independent. 
This redundancy also occurs for rainfall as we will show in Section HI 



3.4.1 Phase space 

For a dynamical system, the term "phase space" indicates the Cartesian 
product R n of the n variables necessary to describe the system. Note that 
the cardinality (n) of a dynamical system is always larger or equal to its 
degrees of freedom. To describe the motion a point particle of mass m in 
one dimension we need its position x and its quantity of motion p (p = mv, 
with v =velocity). In this case the phase space is the xp plane (R 2 ), and at 
each time t the state of the particle is associated to a point in the xp plane. 
The time evolution of a dynamical system is described by the time evolution 
(trajectory) of its associated point in the phase space. 

In the case of the rainfall phenomenon, the variables of interest are bulk 
variables (e.g rainfall rate, reflectivity), which implies a "summation" over 
the number of drops in a given interval of time. All the bulk variables are 
functions of either the couple (N v , f(D)), if one uses the concentration per 
unit volume, or the couple (N,p(D)), if one uses the flux-equivalent descrip- 
tion. If the probability density function at the ground p(D) (or that in a uni- 
tary volume f(D)) has a parametric description: p(D) = p(D, 9%, $2, m ) 
in term of m parameters, then we define the phase space of rainfall as the 
Cartesian product R m+1 spanned by (m+l)-tuple (N, 61, 9 2 , 6 m ). E.g., if 
we consider the concentration N(D) and assume a Gamma distribution for 
the probability density function f(D) in Eq.([T]), the rainfall phase space is R 3 
spanned by the 3-tuple (Ny, A, k). Hereby, we do not impose any particular 
functional form on the function p(D) (and thus f(D)). We show (Section 
H|) that the parameters /1, a, and 7 are sufficient to describe the variability 
of the probability density function at the ground p{D). Therefore the Carte- 
sian product R 4 spanned by the 4-tuple (N, /x, a, 7) can be considered as the 
rainfall phase space. 

Any bulk variable B can be written as function of the phase space pa- 
rameters B = B(N, (jl, a, 7). In the case of the rainfall rate R (expressed in 
mm/h) 



6tt10 



-1 



R= — — N 



+ 3ficr 2 + a 3 7 



(21) 
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This equation states that the set of observation time intervals (1 minute in 
our case) with equal rainfall rate R is a three dimensional manifold in the 
phase space. 

3.5 Kolmogorov-Smirnov's goodness fit for probability 
distribution functions 

The Kolmogorov-Smirnov (K-S) goodness-of-fit test is a non-parametric test 
used to check if a sequence of random samples can be considered as a real- 
ization of a stochastic process with a given cumulative distribution function 
F(x). The test compares the hypothetical F (x) with the cumulative fre- 
quency Fn (x), where Fn (x) — if (N + 1) for x^) < x < xa + i), xu} is the 
i-th order statistics, and i = 1, .., N. The K-S uses as test statistic the max- 
imum difference = max\F(x) — F^ (x)\. If no parameter in F (x) is 
determined from data, then has a distribution which is independent by 
F(x). Thus the critical value of Dn for a significance level of 5% and for 
large samples, iV > 35, is 1.3581/A/iV, and reported in all statistical text- 
books (see e.g. [12J). Contrary, if the parameters of F (x) are estimated, then 
the distribution of D N is dependent on F(x), and the critical value of D N 
must be re-calculated, e.g. via Montecarlo simulations ([10]). The critical 
value of Dn re-calculated is always smaller than the value corresponding to 
the canonical case where it is assumed that "no parameter in F (x) is deter- 
mined from data". Disdrometer data report the occurrence of a drop in a 
given range of diameter values (diameter class) and not an "exact" diameter 
value which is needed to perform the Kolmogrov-Smirnov test. To bypass 
this limitation we assign to a drop in the j-th diameter class a random value 
selected uniformly in the range defined by the class itself. 

4 Results 

All the results reported in this Section refer to probability density function 
observed at the ground p{D). Similar results can be obtained for the concen- 
tration per unit volume and diameter N(D) since the two distribution are 
connected via Eq.([2]). 
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4.1 Measuring the adequacy of the Gamma distribu- 
tion 

The method of the moments MM nl n2 n 3 for the Gamma distribution finds 
the number of drops per unit volume N v , the scale k and shape A such that 
the fitting concentration per unit volume and unit diameter exactly matches 
the moments of order nl, n2 and n3 of the observed concentration per unit 
volume and unit diameter Af(D). None of the MM 2) 3 i 4, MM 3)4j6 , and MM N ^ a 
methods exactly match the moment 3.67, so that the "reproducibility" of the 
rainfall rate (observed versus the one derived from the fitted parameters) has 
been considered as a "measure" of fit goodness: e.g. Q23J) and [20]). Hereby 
, we show that accuracy with which the MM n i. n 2 in 3 method reproduces the 
j-th moment of the concentration N(D) cannot be taken as a measure of 
fit goodness as the accuracy depends on the separation between j and the 
orders nl,n2,n3 and not just on the particular functional form chosen. 

Let us consider the MM 3) 4 i6 method. The fitting distribution by construc- 
tion matches the third, fourth and sixth moment of J\f(D). Therefore it is 
not surprising that the 3.67-th moment of the fitting M{D) (the rainfall rate) 
is close to the observed value: the middle- left panel of Figure 1 indicates the 
relative error x is bounded in the range -0.5%, 0.5%. However, instead of 
a Gamma distribution for the functional form of f{D), one could use any 
other distribution with two parameters (e.g. Gaussian, Lognormal, Beta) 
and obtain similar accuracies. What about the number of drops observed at 
the ground N (the 0.67-th moment of Af(D))7 As shown in the middle-right 
panel of Figure 1, the agreement is not so good as relative error of the order 
of ±25% are possible. Next, we consider the MMjv iA1i(T method. It repro- 
duces the rainfall rate reasonably well, relative error bounded in the -5%, 5% 
range (upper- left panel of Figure 1), but poorly reproduces the reflectivity Z 
(the sixth moment of Af(D)) as relative errors larger than ±25% are not so 
uncommon (upper-right panel of Figure 1). Finally, the MM 2) 3 > 4 method re- 
produces with the same accuracy of the MM 3 4 6 method (extremely well) the 
rainfall rate, the relative error is bounded in the range -0.5%, 0.5% (bottom 
left panel of Figure 1), but is better than the MM3.4.6 method with respect 
the drop count iV (2 is closer to 0.67 than 3), relative error in the range 
-15%, 15% (bottom right panel of Figure 1). 

Figure 1 suggests that the Gamma distribution, as convenient and as 
parsimonious it may be, is not a satisfactory functional form for the drop 
size distribution. To prove this point we use a proper measure of goodness- 
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of- fit, such as the Kolmogorov-Smirnov test, and what is considered to be 
the best fitting procedure, the MML method. In Table 2, we report the 
percentage of acceptance (ACP) and rejection (RJC) of the lower truncated 
Gamma distribution with parameters estimated using the MML, using the 
Kolmogorov-Smirnov goodness-of-fit test with a 5% level of significance to 
each minute of the DRW data (6863). In columns 2 and 3, the percentages are 
calculated using as 5% critical value 1.3581/a//V, the classical value reported 
in all statistical textbooks (e.g. [12]) assuming that no parameters of the 
Gamma distribution are estimated. In columns 4 and 5 the critical value 
is determined via Montecarlo simulations taking into account the fact that 
the parameters are estimated via the MML from data. The first row (ALL) 
reports the fraction of the total number of minutes in the DRW data sets for 
which the Gamma distribution can be considered a good fit. The percentage 
of acceptance passes from 71% to 45% when one takes in proper consideration 
that the parameters of the distribution are obtained from the sample (|10j). 
The remaining rows report the percentage of acceptance and rejection for 
subsets obtained using as thresholds the percentiles of the distribution of 
number of drops iV per minute: 5%-, 25%-, 50%-, 75%-, 95%-percentile. 
E.g. the second row reports the results for the subsets with number of drops 
smaller than the 5%-percentile (N 5 % = 66 in our case): minutes with a 
small sample size. On the other hand, the last row reports the results for 
the subsets with number of drops larger than or equal to the 95%-percentile 
(-^95% — 1589 in our case): minutes with a large sample size . We see that 
when we move to subsets of minutes with large drop counts the percentage 
of acceptance (rejection) diminishes (increases). 

In summary, if we consider the entire DRW data set, we are confident (at 
the 95% level) that the Gamma distribution can be a proper fit for probabil- 
ity density function p(D) only for 45% of 1 minute sampling time intervals. 
More disturbingly the percentage of rejection increases as the sample size in- 
creases. Note that test like the Kolmogorov-Smirnov should be administered 
to sample with a size of at least ~100 ([12]) to have of any significance (in 
other words if the sample size is very small the effect of random fluctuations 
is large enough that almost any tested distribution will pass the test). Re- 
sults for the other nine data sets (no reported here for brevity) are similar 
to that of Darwin. On the ground of these results, we reject the Gamma 
distribution as a proper fit for drop size distributions. 
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4.2 Rainfall phase space 

We show that the value of standardized central moments of order >4 are 
strictly dependent from than the third one (skewness). In particular we study 
the dependence on the skewness 7 of the fourth standardized central moment 
(kurtosis k) and the fifth one denoted by the symbol 77. As a consequence, 
mean (/x), standard deviation (a), and skewness (7) provide a satisfactory 
description of the variability of the probability density function p(D) and 
therefore the 4-tuple (N, //, a, 7) can be considered as the rainfall phase space. 

4.2.1 higher standardized central moments 

To examine the dependence of the parameters k and 77 on 7, we calculate 
for each data set the median, 5%-, and 95%-percentile of the observed values 
of k and 7] for a given value of 7 (in practice this is accomplished dividing 
each data sets in subsets with "equal" (±0.08) value of skewness). The 
results are reported in Figure 2. The median values are depicted with solid 
lines of different colors one for each dataset. We also calculate, merging all 
the datasets, the 5% and 95% percentile for any given range value of the 
skewness if at least 100 samples (1 minute time interval of observation) are 
present. The range between the 5% and 95% percentile is shaded in gray 
in the figure. We see how in both cases (rj vs 7, central panel and k vs 7 
lower panel) the median lines are independent from the site chosen. The 
discrepancies observed for values of skewness larger than ~2.56 are mostly 
due to lack of statistics as shown in the upper panel of Figure 2, where we 
plot the number of sample M for each range value of the skewness. 

4.2.2 Relationship between phase space parameters 

Figure 2 shows that mean n, standard deviation a and skewness 7 are effective 
statistical descriptors for the probability density function of drop diameters 
at the ground p(D). Next, we show that these three parameters are not 
independent. We divide the range of value of the skewness in intervals of 
length 0.64 ([-1.60,-0.96],.. .,[2.88,3.52] as in E]), and the values of the 
mean diameter fi as follows: [0.3,0.4], [0.4,0.5], [0.5,0.6], [0.7,0.8], [0.9,1], 
[1,1.2], [1.2,1.4], [1.4,1.6], [1.6,1.8], [1.8,2.0], [2.0,2.5], and [2.5,3] (indicated 
by the vertical dashed lines in Figure 3). The rationale for these choices 
is to have inside each range of skewness and mean a "reasonable" number 
(^10) of 1 minute interval observations to calculate the median value of the 



14 



parameter a. The results are reported in Figure 3. We see how for value of 
the mean diameter less than 1 mm the median curves are approximatively 
linear and do not depend on the site of observation. In this case, the site 
average angular coefficient and average intercept are reported at the bottom 
right of each panel, if at least the median from 5 sites was available. We 
see how the slope (intercept) increases (decreases) with increasingly larger 
value of the skewness until a sort of plateau is reached for the skewness 
ranges [1.60,2.24] and [2.24,2.88] after which the slope decreases (intercept 
increases) again. For values of the mean diameter larger than 1 the median 
curves depend on the particular site of observation. However, the median 
curve is also estimated from a smaller number of samples (in the rangelO- 
100). Thus, with the present data, we cannot judge if the discrepancies 
between sites for the median curve (in the range fi > 1) are real properties 
of the rainfall phenomenon or merely artifacts due to the poor statistical 
sampling available. 

In summary, if fi is in the interval [0.3,1] then 

a = a( 7 ) + b(y)fi (22) 

where a is the intercept and b the slope. Note that if the Gamma distribution 
was indeed an extremely accurate fit to probability density function of drop 
diameter at the ground p(D) then a = 0.5/ry which is not supported by the 
experimental evidences depicted in Figure 3. Eq. (l22|) suggests that only two 
parameters of the triplets (/x, a, 7) are necessary to describe the probability 
density p(D) of drop diameters at the ground. Thus one could define the 
rainfall phase space as a tridimensional space: e.g. the Cartesian product of 
the 3-tuple (N,fi,j). However, longer data sets are necessary to effectively 
estimate the functions 0(7) and 6(7), and to explore the relationship a = 
0(7) + 6( 7 )/x in the range fi > 1mm. Therefore, for the purpose of this 
manuscript, we conservatively consider R 4 defined by the 4-tuple (N, fi, a, 7) 
as the rainfall phase space. 

4.2.3 Phase plots 

To each 1 minute time interval of observation is associated the point of coor- 
dinates (N, /i, a, 7) in the phase space. The entire data set occupies a volume 
in R 4 . To visualize this volume, we need to consider the six 2D projections: 
A* - iogioW, cr - log 10 (iV), 7 - log 10 (iV), n - cr, n - 7, and a - 7 (we use 
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log 10 (iV) instead of N for better visualization). Given a data set, we calcu- 
late the density of points in the phase space for all six 2D projections. Ten 
separate figures (one for each data set) would be necessary to illustrate the 
results. To obviate to this difficulty and give an idea to the reader of the 
differences/similarities between data sets we adopted the concept of average 
bounding perimeter. For all 2D projections we calculate the center of mass 
of the phase space points (each point has the same mass). With the center of 
mass as fixed point we span with an 10° angle step the plane of the 2D projec- 
tion. For each 10° cone we calculate the average distance from the center of 
mass of the points within the cone. The connection with a continuous line of 
all average distance creates the average bounding perimeter which is a "mea- 
surement" of the volume of the phase space occupied by the database. The 
results are shown in Figure 4. The plots on the \i — a projection plane show 
how the average bounding perimeters reflect the linear relationship between 
mean drop diameter and standard deviation of drop diameter depicted in 
Figure 3. The plots on the \i — log 10 (iV) plane projection indicate that large 
value of counts (log 10 (A r ) > 2.8) are reached (on average) only for values of 
the mean drop diameter which are small (fi in the range 0.4-0.7 mm) or large 
(/i > 1mm). The first case, many drops with small diameter, is a common 
feature of orographic precipitation as shown (e.g. [5], [6], [16], [T], and [1]). 
The BBY and CZC data sets are the same ones used in ([5], [6], and [16]) 
while BAO data set come from an instrument located at 1,577 m asml. The 
second case, many drops with possibly large diameters, is typical of strong 
convective events, the DRW data sets (rain of monsoonic origin) and the 
MIK (small island at the equator) and BKT (another equatorial site) are the 
data sets, among those considered, where the combination of large number 
of drops and large drop diameters occur more frequently on average. The av- 
erage bounding perimeter on the \x — 7 plane projection show that a decrease 
in value of the mean drop diameter is linked to a raise of the skewness value, 
although this may be in part an effect of the limitation of the instrument 
(Joss-Waldvogel impact disdrometer) which is not capable of detecting drop 
diameters smaller than 0.3 mm (reducing the contribution of left tails of p(D) 
to the skewness). Due to the approximate linear relationship between // and 
a, results of projection on planes for which one of the axis is the standard 
deviation o are similar to those for which axis is substituted by the mean 
/1. If we consider the result on the 7 — log 10 (iV) plane projection, we see 
how these two variable are quite uncorrelated as the shape of the average 
bounding perimeters do not suggest any particular relation. 
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4.2.4 Rain rate and phase space parameters 

The rainfall rate R aside from a multiplicative constant is the sum of three 
factors (Eq j2Tl) : Nfi 3 , 3Nfia 2 , and Na 3/ -f. Each factor accounts for a fraction 
a G [—1, 1] (negative values are possible only for the factor N<j 3/ y) of the 
rainfall rate. We calculate a for each factor and each 1 minute time interval 
of observation. Then we calculate the probability F(a) that the fraction does 
not exceed a. The results for each data base and each factor are shown in 
Figure 5. The F(a) curves are quite independent from the particular site. 
We see how the factor Nfi 3 contributes the most to the rainfall rate with a 
median contribution a m (F(a m ) = 0.5) which is in the range 0.7-0.75. The 
second largest contribution comes from the factor 3N[ia 2 , a m in the range 
0.2-0.25, while the factor Na 3r y accounts for the smallest contribution: a m in 
the range 0.025-0.05. These results can be explained noticing that at all sites 
and for all time interval of observation: 1) a < 1, and 2) a < jj,. So that 
3/iO" 2 is almost always smaller than /i 3 , and while 7 can be larger (in absolute 
value) than /x, a 3, y is always smaller than /j, 3 in virtue of 1) and 2). 

5 Conclusions 

When an objective measure of fit goodness is adopted, the Gamma distribu- 
tion provides a poor fit to the drop size distribution sampled at short time 
scale (1 minute in our case) at all the ten sites considered. It is the opinion of 
the Authors that only an objective criterion of fit goodness (e.g. Kolmogorov- 
Smirnov) should guide the choice of a particular functional form for the con- 
centration N(D) and/or the probability density function at the ground p(D). 
For this reason we reject the Gamma distribution as a proper parametriza- 
tion of the rainfall phenomenon. We propose an alternative parametrization 
based on the common statistical procedure of describing an unknown prob- 
ability density function in term of its standardized central moments. We 
show that the 4-tuple of parameters (N, /1, a, 7) is sufficient to describe the 
observed variability of disdrometer counts for all the ten sites considered, 
and refer to the Cartesian product of 4-tuple (N, fi, a, 7) as the rainfall phase 
space. The volumes in the phase space relative to each data base (Figure 
4) have some common features and some discrepancies which reflect differ- 
ent synoptic conditions and/or mechanisms of drop productions at play at 
the different site considered. However some results remarkably independent 
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from the site considered: 1) standardized central moments of order > 4 have 
strong deterministic relationship with the third standardized moment: the 
skewness. 2) mean /z, standard deviation a, and skewness 7 are related to 
each other via Eq.f l22|) with values of the slope 6(7) and intercept 0(7) which 
are not compatible with a Gamma distribution functional dependence. Fi- 
nally, bulk variables of the rainfall can be written as a function of 4-tuple 
(N, fi, a, 7). e.g. Eq. f[2"Tj) for the rainfall rate R. Bulk variables, such as the 
liquid water content W and the reflectivity Z are proportional to fractional 
moments (2.33 and 5.33 respectively) of the probability density function of 
drop diameter at the ground p{D). Therefore, analytical expressions for the 
variables W ans Z in terms of the 4-tuple (N, fi, cr.7) are necessarily approx- 
imations, on the contrary the Gamma distribution approximation leads to 
exact analytical expression. However, the main result of this paper is that 
any approximated expression, obtained via the proposed parametrization, is 
physically meaningful while any exact expression, obtained via the Gamma 
distribution parametrization, is not. 
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Table 1: List of the sites from which Joss-Waldvogel disdrometer data are 
considered, with a three letters code for short referral and the Koppen-Geiger 
climate classification. 



Site 


Code 


Koppen-Geiger climate clas- 
smcation 


Eire, Colorado (USA) 


BAO 


Snow climate, fully humid with 
warm summer 


Bodega Bay, California (USA) 


BBY 


Warm temperate climate with 
dry and warm summer 


Bukit Koto Tabang, Indonesia 


BKT 


Equatorial rain forest, fully hu- 
mid 


Chilbolton, United Kingdom 


CHB 


Warm temperate climate, fully 
humid with cool summer and cold 
winter 


Cazadero, California (USA) 


czc 


Warm temperate climate with 
dry and warm summer 


Darwin, Australia 


DRW 


Equatorial savannah with dry 
winter 


Hassel, Germany 


HSL 


Warm temperate climate, fully 
humid with warm summer 


Kashima, Japan 


KSH 


Warm temperate climate with 
dry and warm summer 


Macunaga, Italy 


MAC 


Snow climate, fully humid with 
cool summer and cold winter 


Kwajalein Atoll, RMI 


MIK 


Equatorial rainforest, fully humid 
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Table 2: List of the sites (short code referral) from which Joss-Waldvogel 
disdrometer data considered with latitude, longitude, altitude, number of 1 
minute time interval in data set, and total number of drops in the data set. 
The symbol (*) indicates quantities calculated after data sets are processed 



according to the 


procedure c 


escribed in Section 2.1. 


Code 


Long. 


Lat. 


Alt. (m) 




#drop* 


BAO 


40.05N 


105.00W 


1,577 


6,016 


2,349,280 


BBY 


38.20N 


123.00W 


12 


10,804 


5,389,240 


BKT 


0.12S 


100. 19E 


864 


68,389 


25,109,376 


CHB 


51.14N 


1.43W 


82 


29,122 


7,015,480 


CZC 


38.61N 


123.22W 


475 


76,137 


44,252,384 


DRW 


12.45S 


130.83E 


12 


6,863 


2,753,037 


HSL 


51.51N 


7.1E 


60 


26,402 


7,072,649 


KSH 


35.95S 


140.65E 


45 


68,570 


19,752,935 


MAC 


45.97N 


7.96E 


1,300 


9,956 


3,275,264 


MIK 


8.71N 


167.73W 


1 


20,170 


7,594,915 
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Table 3: Percentage of acceptance and rejection of the (lower truncated) 
Gamma distribution using the Kolmogorov-Smirnov goodness-of-fit test to 
each minute of the DRW data (6863) with a 5% level of significance. In 
columns 2 and 3 the critical value used is 1.3581 /^/N assuming that no pa- 
rameters of the Gamma distribution are estimated, while in columns 4 and 
5 the critical value is determined via Montecarlo simulations taking into ac- 
count the fact that the parameters of Gamma are estimated via the MML 
from data. The results refer to entire DRW dataset (first row), and to sub- 
sets obtained considering as thresholds, the percentiles of the distribution 
of number of drops N per minute (second tttlast row). The thresholds are 
N 5% = 66, N 2b% = 104, N 50% = 182, N 75% = 455, and N 95% = 1589. 





ACP 


RJC 


ACP 


RJC 


ALL 


71% 


29% 


45% 


55% 


< ^5% 


94% 


6% 


78% 


22% 


> ^5% 


70% 


30% 


44% 


56% 


< AWc 


93% 


7% 


71% 


29% 


> N 25% 


64% 


36% 


37% 


63% 


< N 50% 


91% 


9% 


65% 


35% 


> N 50% 


52% 


48% 


25% 


75% 


< N 75% 


84% 


16% 


56% 


44% 


> N 75% 


34% 


66% 


13% 


87% 


< N 95% 


74% 


26% 


47% 


53% 


> N 95% 


24% 


76% 


5% 


95% 
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BAO BBY BKT CHB CZC DRW HSL KSH MAC MIK 




Figure 1: The observed frequency F(x) of exceeding the relative error \ when 
calculating different bulk variables with the fitted Gamma distribution to the 
probability density function p(D) of drop diameters at the ground. Variables 
R and Z for the MMjv iAljCr method (upper panels), variables R and N for 
the MMa^^method (middle panels), and MM2,3,4 method (bottom panels). 
Different colors indicate different data bases. 
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BAO BBY BKT CHB CZC DRW HSL KSH MAC MIK 




-1.28 1.28 2.56 3.84 5.12 

Y 



Figure 2: The number of samples M (upper panel), the kurtosis k (mid- 
dlepanel), and the fifth standardized central moment 77 (lower panel)as a 
function of the skewness 7. Solid lines indicate the number of samples and 
the median values for the parameters k and rj. Gray shadowed areas repre- 
sent the 5%-, 95%-percentile range. Different colors indicate different data 
bases. 
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BAO BBY BKT CHB CZC DRW HSL KSH MAC MIK 
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Figure 3: The median values of the standard deviation o of drop diameters 
for different range value of the mean diameter \i (indicated by vertical dashed 
lines) and different range values of the skewness of drop diameter 7 (shown 
in the top left corner of each panel). Also shown in the bottom right corner 
of each panel is the site average slope and intercept of the median standard 
deviation curves in the interval /i G [0.3 : 1]. Different colors indicate different 
databases. 
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BAO BBY BKT CHB CZC DRW HSL KSH MAC MIK 
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Figure 4: Phase space 2D projections plots of the average bounding perimeter 
lines for each site of observation. Different colors indicate different data bases. 
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BAO BBY BKT CHB CZC DRW HSL KSH MAC MIK 




0.2 0.4 0.6 0.8 1 



a 

Figure 5: The probability F(a) of contributing a fraction a to the rainfall 
rate for the three factors Nfi 3 , 3Nfia 2 , and iV<7 3 7 of Eq. (l2T|) . Different colors 
indicate different data bases. 
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